Warped Mixtures for Nonparametric Cluster Shapes
Abstract
A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model that warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density manifolds) describing the data. The number of manifolds, as well as the shape and dimension of each manifold, are automatically inferred. We derive a simple inference scheme for this model that analytically integrates out both the mixture parameters and the warping function. We show that our model is effective for density estimation, performs better than infinite Gaussian mixture models at recovering the true number of clusters, and produces interpretable summaries of high-dimensional datasets.
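As a rough illustration of the generative idea only (not the paper's inference scheme), the sketch below draws points from a two-component latent Gaussian mixture and pushes them through a fixed warp, which turns round latent blobs into curved clusters. The function names, the mixture parameters, and the quadratic warp are all hypothetical choices made for this example; in the model described above the warping function is inferred rather than hand-picked.

```python
import random


def sample_latent_mixture(n, means, std, rng):
    """Draw n points from an equal-weight, isotropic 2-D Gaussian mixture."""
    points, labels = [], []
    for _ in range(n):
        k = rng.randrange(len(means))  # pick a component uniformly
        mx, my = means[k]
        points.append((rng.gauss(mx, std), rng.gauss(my, std)))
        labels.append(k)
    return points, labels


def warp(point):
    """Hypothetical smooth, invertible warp: bends each blob into an arc."""
    x, y = point
    return (x, y + 0.5 * x * x)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
latent, labels = sample_latent_mixture(200, [(0.0, 0.0), (0.0, 6.0)], 1.0, rng)
observed = [warp(p) for p in latent]
```

Here `latent` holds two ordinary Gaussian clusters, while `observed` holds the same points bent into banana shapes. A plain Gaussian mixture fit directly to `observed` would tend to need several components per arc, which is the failure mode the abstract describes.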
Similar resources
Identifiability of Nonparametric Mixture Models and Bayes Optimal Clustering
Motivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable by introducing a novel framework for clustering overfitted parametric (i.e. misspecified) mixture models. These conditions generalize existing conditions in the literature, and are flexible enough to include for example mixtures of Gaussian mixtures. In...
Dirichlet Process Parsimonious Mixtures for clustering
The parsimonious Gaussian mixture models, which exploit an eigenvalue decomposition of the group covariance matrices of the Gaussian mixture, have shown their success in particular in cluster analysis. Their estimation is in general performed by maximum likelihood estimation and has also been considered from a parametric Bayesian perspective. We propose new Dirichlet Process Parsimonious mixtur...
Identifying Finite Mixtures of Nonparametric Product Distributions and Causal Inference of Confounders
We propose a kernel method to identify finite mixtures of nonparametric product distributions. It is based on a Hilbert space embedding of the joint distribution. The rank of the constructed tensor is equal to the number of mixture components. We present an algorithm to recover the components by partitioning the data points into clusters such that the variables are jointly conditionally indepen...
Nonparametric Bayesian Clustering via Infinite Warped Mixture Models
We introduce a flexible class of mixture models for clustering and density estimation. Our model allows clustering of non-linearly-separable data, produces a potentially low-dimensional latent representation, automatically infers the number of clusters, and produces a density estimate. Our approach makes use of two tools from Bayesian nonparametrics: a Dirichlet process mixture model to allow a...
A Nonparametric Multi-seed Data Clustering Technique
Clustering of data around one seed does not work well if the shape of the cluster is elongated or non-convex. A complex-shaped cluster requires several seeds. This study developed a nonparametric multi-seed data clustering approach that uses splitting and merging procedures to handle the complex shapes of clusters. The splitting process utilizes a genetic algorithm to search for the appropriate cluster...
Journal title:
- CoRR
Volume abs/1408.2061
Publication date 2013